    Supervised Random Walks: Predicting and Recommending Links in Social Networks

    Predicting the occurrence of links is a fundamental problem in networks. In the link prediction problem we are given a snapshot of a network and would like to infer which interactions among existing members are likely to occur in the near future or which existing interactions we are missing. Although this problem has been extensively studied, the challenge of how to effectively combine the information from the network structure with rich node and edge attribute data remains largely open. We develop an algorithm based on Supervised Random Walks that naturally combines the information from the network structure with node and edge level attributes. We achieve this by using these attributes to guide a random walk on the graph. We formulate a supervised learning task where the goal is to learn a function that assigns strengths to edges in the network such that a random walker is more likely to visit the nodes to which new links will be created in the future. We develop an efficient training algorithm to directly learn the edge strength estimation function. Our experiments on the Facebook social graph and large collaboration networks show that our approach outperforms state-of-the-art unsupervised approaches as well as approaches that are based on feature extraction.
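
The idea of letting edge attributes guide a random walk can be illustrated with a minimal sketch. It assumes a logistic edge strength function and a walk with restart to the source node; the function names, the restart probability, and the fixed weights are all assumptions made for illustration. The abstract describes learning the strength function by training, which this sketch does not attempt: the weights are simply passed in.

```python
import math

def edge_strength(features, weights):
    """Logistic edge strength from a feature vector (an assumed functional form)."""
    z = sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))

def random_walk_with_restart(edges, weights, source, restart=0.15, iters=100):
    """Visit probabilities of a walk whose steps are biased by edge strengths.

    edges: dict mapping (u, v) -> feature list for a directed edge u -> v.
    weights: parameters of the strength function (given here, not learned).
    """
    nodes = sorted({n for edge in edges for n in edge})
    # Strength-weighted out-neighbours of every node.
    out = {n: {} for n in nodes}
    for (u, v), feats in edges.items():
        out[u][v] = edge_strength(feats, weights)

    p = {n: 1.0 if n == source else 0.0 for n in nodes}
    for _ in range(iters):
        nxt = {n: 0.0 for n in nodes}
        for u in nodes:
            total = sum(out[u].values())
            if total == 0.0:
                # Node without out-edges: return all of its mass to the source.
                nxt[source] += p[u]
                continue
            for v, strength in out[u].items():
                nxt[v] += (1.0 - restart) * p[u] * strength / total
            nxt[source] += restart * p[u]
        p = nxt
    return p
```

Nodes with a high visit probability would be the candidates for new links from the source. Raising the weight on a feature that marks certain edges shifts probability mass toward the nodes behind those edges.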

    BDGS: A Scalable Big Data Generator Suite in Big Data Benchmarking

    Data generation is a key issue in big data benchmarking that aims to generate application-specific data sets to meet the 4V requirements of big data. Specifically, big data generators need to generate scalable data (Volume) of different types (Variety) under controllable generation rates (Velocity) while keeping the important characteristics of raw data (Veracity). This gives rise to various new challenges about how we design generators efficiently and successfully. To date, most existing techniques can only generate limited types of data and support specific big data systems such as Hadoop. Hence we develop a tool, called Big Data Generator Suite (BDGS), to efficiently generate scalable big data while employing data models derived from real data to preserve data veracity. The effectiveness of BDGS is demonstrated by developing six data generators covering three representative data types (structured, semi-structured and unstructured) and three data sources (text, graph, and table data).

    Time-Varying Graphs and Dynamic Networks

    The past few years have seen intensive research efforts carried out in some apparently unrelated areas of dynamic systems -- delay-tolerant networks, opportunistic-mobility networks, social networks -- obtaining closely related insights. Indeed, the concepts discovered in these investigations can be viewed as parts of the same conceptual universe; and the formal models proposed so far to express some specific concepts are components of a larger formal description of this universe. The main contribution of this paper is to integrate the vast collection of concepts, formalisms, and results found in the literature into a unified framework, which we call TVG (for time-varying graphs). Using this framework, it is possible to express directly in the same formalism not only the concepts common to all those different areas, but also those specific to each. Based on this definitional work, employing both existing results and original observations, we present a hierarchical classification of TVGs; each class corresponds to a significant property examined in the distributed computing literature. We then examine how TVGs can be used to study the evolution of network properties, and propose different techniques, depending on whether the indicators for these properties are a-temporal (as in the majority of existing studies) or temporal. Finally, we briefly discuss the introduction of randomness in TVGs.
    Comment: A short version appeared in ADHOC-NOW'11. This version is to be published in International Journal of Parallel, Emergent and Distributed Systems.
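
The distinction between a-temporal and temporal indicators can be made concrete with a small sketch. It assumes a time-varying graph is stored as a list of timed contacts `(u, v, t)` and that crossing an edge takes one time unit; this representation and both function names are assumptions for illustration, not the formalism defined in the paper.

```python
def static_reachable(contacts, source):
    """Nodes reachable from source when contact times are ignored (a-temporal view)."""
    adj = {}
    for u, v, _t in contacts:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    seen, stack = {source}, [source]
    while stack:
        u = stack.pop()
        for v in adj.get(u, ()):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen

def earliest_arrival(contacts, source, start=0):
    """Earliest time each node can be reached along time-respecting paths.

    contacts: iterable of (u, v, t), an undirected edge usable only at time t.
    Crossing an edge is assumed to take one time unit, so a contact used at
    time t delivers at time t + 1.
    """
    arrival = {source: start}
    # Process contacts in time order; a single pass suffices because a contact
    # at time t can only produce arrivals later than t.
    for u, v, t in sorted(contacts, key=lambda c: c[2]):
        for a, b in ((u, v), (v, u)):
            if a in arrival and arrival[a] <= t and t + 1 < arrival.get(b, float("inf")):
                arrival[b] = t + 1
    return arrival
```

With contacts `[("a", "b", 2), ("b", "c", 1)]`, the static view connects `a` to `c`, but no time-respecting path from `a` reaches `c`, because the `b`-`c` contact has already passed by the time `b` is reached. The reverse direction, from `c` to `a`, does work.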

    Stable and Efficient Structures for the Content Production and Consumption in Information Communities

    Real-world information communities exhibit inherent structures that characterize a system that is stable and efficient for content production and consumption. In this paper, we study such structures through mathematical modelling and analysis. We formulate a generic model of a community in which each member decides how they allocate their time between content production and consumption with the objective of maximizing their individual reward. We define the community system as "stable and efficient" when a Nash equilibrium is reached while the social welfare of the community is maximized. We investigate the conditions for forming a stable and efficient community under two variations of the model representing different internal relational structures of the community. Our analysis results show that the structure with "a small core of celebrity producers" is optimally stable and efficient for a community. These analysis results provide possible explanations for sociological observations such as "the Law of the Few" and also provide insights into how to effectively build and maintain the structure of information communities.
    Comment: 21 pages

    Kronecker Graphs: An Approach to Modeling Networks

    How can we model networks with a mathematically tractable model that allows for rigorous analysis of network properties? Networks exhibit a long list of surprising properties: heavy tails for the degree distribution; small diameters; and densification and shrinking diameters over time. Most present network models either fail to match several of the above properties, are complicated to analyze mathematically, or both. In this paper we propose a generative model for networks that is both mathematically tractable and can generate networks that have the above-mentioned properties. Our main idea is to use the Kronecker product to generate graphs that we refer to as "Kronecker graphs". First, we prove that Kronecker graphs naturally obey common network properties. We also provide empirical evidence showing that Kronecker graphs can effectively model the structure of real networks. We then present KronFit, a fast and scalable algorithm for fitting the Kronecker graph generation model to large real networks. A naive approach to fitting would take super-exponential time. In contrast, KronFit takes linear time, by exploiting the structure of Kronecker matrix multiplication and by using statistical simulation techniques. Experiments on large real and synthetic networks show that KronFit finds accurate parameters that indeed mimic the properties of target networks very well. Once fitted, the model parameters can be used to gain insights about the network structure, and the resulting synthetic graphs can be used for null-models, anonymization, extrapolations, and graph summarization.
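
The generation step can be sketched as a repeated Kronecker product of a small initiator matrix with itself. The sketch below assumes square matrices stored as lists of lists and treats each entry of the resulting matrix as an independent edge probability; the function names and the sampling step are illustrative assumptions. It covers generation only and does not attempt the KronFit fitting procedure.

```python
import random

def kronecker_product(a, b):
    """Kronecker product of two square matrices given as lists of lists."""
    n, m = len(a), len(b)
    return [[a[i // m][j // m] * b[i % m][j % m] for j in range(n * m)]
            for i in range(n * m)]

def kronecker_power(initiator, k):
    """k-th Kronecker power of the initiator matrix (k >= 1)."""
    result = initiator
    for _ in range(k - 1):
        result = kronecker_product(result, initiator)
    return result

def sample_graph(prob_matrix, seed=0):
    """Draw a directed graph, keeping arc (i, j) with probability prob_matrix[i][j]."""
    rng = random.Random(seed)
    n = len(prob_matrix)
    return [(i, j) for i in range(n) for j in range(n)
            if rng.random() < prob_matrix[i][j]]
```

A 2x2 initiator raised to the k-th power yields a graph on 2^k nodes, so the node count grows exponentially in k while the model keeps only four parameters. Materializing the full matrix as done here costs quadratic memory in the number of nodes and is only practical for small k.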

    Analysis of the Web Graph Aggregated by Host and Pay-Level Domain

    In this paper the web is analyzed as a graph aggregated by host and pay-level domain (PLD). The web graph datasets, publicly available, have been released by the Common Crawl Foundation and are based on a web crawl performed during the period May-June-July 2017. The host graph has ~1.3 billion nodes and ~5.3 billion arcs. The PLD graph has ~91 million nodes and ~1.1 billion arcs. We study the distributions of degree and sizes of strongly/weakly connected components (SCC/WCC) focusing on power-law detection using statistical methods. The statistical plausibility of the power law model is compared with that of several alternative distributions. While there is no evidence of power law tails on host level, they emerge on PLD aggregation for indegree, SCC and WCC size distributions. Finally, we analyze distance-related features by studying the cumulative distributions of the shortest path lengths, and give an estimation of the diameters of the graphs.
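
Aggregating a graph from host level to PLD level amounts to collapsing groups of nodes. The sketch below assumes the host-to-PLD mapping is supplied as a dictionary, since deriving it properly requires a public suffix list. Dropping arcs inside one PLD and merging parallel arcs are assumptions made here for illustration, not choices confirmed by the abstract, and the function names are hypothetical.

```python
def aggregate_graph(arcs, group_of):
    """Collapse a directed graph onto groups of nodes.

    arcs: iterable of (source, target) host pairs.
    group_of: dict mapping each host to its group (here, a pay-level domain).
    Arcs inside one group are dropped and parallel arcs are merged
    (both are assumptions of this sketch).
    """
    merged = set()
    for u, v in arcs:
        gu, gv = group_of[u], group_of[v]
        if gu != gv:
            merged.add((gu, gv))
    return merged

def indegrees(arcs):
    """Number of incoming arcs per node, the quantity behind an indegree distribution."""
    counts = {}
    for _u, v in arcs:
        counts[v] = counts.get(v, 0) + 1
    return counts
```

Applying `indegrees` to the host-level arcs and to the aggregated arcs gives the two indegree samples whose distributions could then be compared.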

    Shortest path discovery of complex networks

    In this paper we present an analytic study of sampled networks in the case of some important shortest-path sampling models. We present analytic formulas for the probability of edge discovery in the case of an evolving and a static network model. We also show that the number of discovered edges in a finite network scales much more slowly than predicted by earlier mean field models. Finally, we calculate the degree distribution of sampled networks, and we demonstrate that they are analogous to a destructed network obtained by randomly removing edges from the original network.
    Comment: 10 pages, 4 figures
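
Shortest-path sampling can be illustrated with a breadth-first search that records one shortest path from a source to every other node and reports which edges those paths reveal. This is a minimal sketch of the general idea rather than any of the specific sampling models analyzed in the paper; the function name and the tie-breaking rule between equally short paths are assumptions.

```python
from collections import deque

def discovered_edges(adj, source):
    """Edges seen when one shortest path from source to every node is traced.

    adj: dict mapping node -> list of neighbours (undirected graph).
    Ties between equally short paths are broken by BFS visiting order,
    which is an assumption of this sketch.
    """
    parent = {source: None}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in parent:
                parent[v] = u
                queue.append(v)
    # Each non-source node contributes the edge to its BFS parent.
    return {frozenset((v, p)) for v, p in parent.items() if p is not None}
```

On a 4-cycle, a single source reveals three of the four edges, and the missing edge only appears once a second source is added. This is the kind of incomplete discovery the abstract quantifies.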

    Co-community Structure in Time-varying Networks

    In this report, we introduce the concept of co-community structure in time-varying networks. We propose a novel optimization algorithm to rapidly detect co-community structure in these networks. Both theoretical and numerical results show that the proposed method not only can resolve detailed co-communities, but also can effectively identify the dynamical phenomena in these networks.
    Comment: 5 pages, 6 figures

    The Dynamics of Viral Marketing

    We present an analysis of a person-to-person recommendation network, consisting of 4 million people who made 16 million recommendations on half a million products. We observe the propagation of recommendations and the cascade sizes, which we explain by a simple stochastic model. We analyze how user behavior varies within user communities defined by a recommendation network. Product purchases follow a 'long tail' where a significant share of purchases belongs to rarely sold items. We establish how the recommendation network grows over time and how effective it is from the viewpoint of the sender and receiver of the recommendations. While on average recommendations are not very effective at inducing purchases and do not spread very far, we present a model that successfully identifies communities, product and pricing categories for which viral marketing seems to be very effective.